Instance-Based Learning Techniques of Unsupervised Feature Weighting Do not Perform So Badly!

Authors

  • Héctor Núñez
  • Miquel Sànchez-Marrè

Knowledge Engineering & Machine Learning Group, Technical University of Catalonia, Barcelona, email: {hnunez, miquel}@lsi.upc.es

Abstract

The major hypothesis that we will prove in this paper is that unsupervised feature weighting techniques are not significantly worse than supervised methods, as is commonly believed in the machine learning community. This paper tests the power of unsupervised feature weighting techniques for predictive tasks across several domains. The paper analyses several unsupervised and supervised feature weighting techniques and proposes new unsupervised ones. Two unsupervised entropy-based weighting algorithms are proposed and tested against all the other techniques. The techniques are evaluated in terms of predictive accuracy on unseen instances, measured by a ten-fold cross-validation process. The testing was done on thirty-four data sets from the UCI Machine Learning Database Repository and other sources. Unsupervised weighting methods assign weights to attributes without any knowledge of class labels, so their task is considerably more difficult. It has commonly been assumed that unsupervised methods would perform substantially worse than supervised ones, as they do not use any domain knowledge to bias the process. The major result of the study is that unsupervised methods really are not so bad. Moreover, one of the newly proposed unsupervised methods has shown promising behaviour on domains with many irrelevant features, reaching performance similar to that of some of the supervised methods.
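The abstract leaves the two entropy-based algorithms unspecified, so the following is only a minimal sketch of the kind of pipeline it describes: attribute weights derived from the entropy of each feature's value distribution (computed without class labels), plugged into an instance-based learner (a k-NN classifier here), and scored by ten-fold cross-validation. The particular entropy heuristic, the iris data set, and all helper names are illustrative assumptions, not the authors' method.

```python
# Minimal sketch (assumptions, not the paper's exact algorithms): unsupervised
# entropy-based attribute weights feeding a k-NN classifier, evaluated with
# ten-fold cross-validation as described in the abstract.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler


def entropy_weights(X, n_bins=10):
    """Weight each attribute by how peaked (low-entropy) its value distribution
    is, using only the unlabeled feature matrix X (no class labels involved)."""
    weights = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        counts, _ = np.histogram(X[:, j], bins=n_bins)
        p = counts[counts > 0] / counts.sum()
        h = -(p * np.log2(p)).sum()               # empirical entropy of attribute j
        weights[j] = 1.0 - h / np.log2(n_bins)    # uniform -> 0, concentrated -> 1
    weights = np.clip(weights, 1e-12, None)       # avoid an all-zero weight vector
    return weights / weights.sum()


X, y = load_iris(return_X_y=True)                 # stand-in for a UCI data set
w = entropy_weights(MinMaxScaler().fit_transform(X))   # weights computed without labels

# Rescaling each feature by its weight turns plain Euclidean k-NN into weighted k-NN.
model = Pipeline([
    ("scale", MinMaxScaler()),
    ("weight", FunctionTransformer(lambda Z: Z * w)),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
])

scores = cross_val_score(model, X, y, cv=10)      # ten-fold cross-validation
print(f"10-fold CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

In this sketch a concentrated (low-entropy) value distribution is taken as a proxy for attribute relevance; the paper's actual entropy-based algorithms may define the weights differently.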


Similar papers

Exploiting kernel-based feature weighting and instance clustering to transfer knowledge across domains

Learning invariant features across domains is of vital importance to unsupervised domain adaptation, where classifiers trained on the training examples (source domain) need to adapt to a different set of test examples (target domain) in which no labeled examples are available. In this paper, we propose a novel approach to find the invariant features in the original space and transfer the knowle...


A Supervised Method of Feature Weighting for Measuring Semantic Relatedness

The clustering of related words is crucial for a variety of Natural Language Processing applications. Many known techniques of word clustering use the context of a word to determine its meaning. Words which frequently appear in similar contexts are assumed to have similar meanings. Word clustering usually applies the weighting of contexts, based on some measure of their importance. One of the m...


Cluster-Dependent Feature Selection through a Weighted Learning Paradigm

This paper addresses the problem of selecting a subset of the most relevant features from a dataset through a weighted learning paradigm. We propose two automated feature selection algorithms for unlabeled data. In contrast to supervised learning, the problem of automated feature selection and feature weighting in the context of unsupervised learning is challenging, because label information is...


Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features

Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...


Unsupervised Feature Learning via Non-Parametric Instance Discrimination

Neural net classifiers trained on data with annotated class labels can also capture apparent visual similarity among categories without being directed to do so. We study whether this observation can be extended beyond the conventional domain of supervised learning: Can we learn a good feature representation that captures apparent similarity among instances, instead of classes, by merely asking ...



Journal:

Volume:   Issue:

Pages:   -

Publication date: 2004